68 research outputs found

    Is Self-Supervised Pretraining Good for Extrapolation in Molecular Property Prediction?

    Full text link
    The prediction of material properties plays a crucial role in the development and discovery of materials for diverse applications, such as batteries, semiconductors, catalysts, and pharmaceuticals. Recently, there has been growing interest in data-driven approaches that use machine learning technologies in combination with conventional theoretical calculations. In materials science, the prediction of unobserved values, commonly referred to as extrapolation, is particularly critical for property prediction, as it enables researchers to gain insight into materials beyond the limits of the available data. However, even with recent advances in powerful machine learning models, accurate extrapolation is still widely recognized as a significantly challenging problem. Self-supervised pretraining, meanwhile, is a machine learning technique in which a model is first trained on unlabeled data using relatively simple pretext tasks before being trained on labeled data for the target tasks. Because self-supervised pretraining can effectively utilize material data without observed property values, it has the potential to improve a model's extrapolation ability. In this paper, we clarify how such self-supervised pretraining can enhance extrapolation performance. We propose an experimental framework for this demonstration and empirically reveal that, while models were unable to accurately extrapolate absolute property values, self-supervised pretraining enables them to learn the relative tendencies of unobserved property values and thereby improves extrapolation performance.
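The evaluation split this abstract describes can be sketched as follows. This is a minimal illustrative toy, not the paper's actual framework: the model, the threshold, and all names are assumptions. It trains a simple surrogate only on samples whose property value falls below a cutoff, then tests in the held-out high-value region, separating absolute accuracy from preserved relative ordering.

```python
# Hedged sketch of an extrapolative evaluation split: train only where the
# property value is "observed" (below a threshold), test beyond it.
# The quadratic toy relation and the linear surrogate are assumptions.
random_xs = [i / 10 for i in range(100)]

def property_of(x):          # toy "true" structure-property relation
    return x ** 2

data = [(x, property_of(x)) for x in random_xs]
train = [(x, y) for x, y in data if y < 25.0]   # observed region only
test = [(x, y) for x, y in data if y >= 25.0]   # extrapolation region

# Toy surrogate: ordinary least-squares line fit on the training region.
n = len(train)
sx = sum(x for x, _ in train); sy = sum(y for _, y in train)
sxx = sum(x * x for x, _ in train); sxy = sum(x * y for x, y in train)
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n
predict = lambda x: slope * x + intercept

preds = [predict(x) for x, _ in test]
truth = [y for _, y in test]

# Absolute accuracy degrades outside the training range ...
mae = sum(abs(p - t) for p, t in zip(preds, truth)) / len(test)

# ... but the *relative tendency* (here: the monotone ordering of unobserved
# values) can survive, which is the distinction the abstract draws.
ordered = all(p1 < p2 for p1, p2 in zip(preds, preds[1:]))
print(f"extrapolation MAE = {mae:.2f}, ordering preserved = {ordered}")
```

The gap between a large MAE and a preserved ordering is exactly the "relative tendencies without absolute values" behavior the abstract attributes to pretrained models.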

    On Data Imbalance in Molecular Property Prediction with Pre-training

    Full text link
    Revealing and analyzing the various properties of materials is an essential and critical issue in the development of materials, including batteries, semiconductors, catalysts, and pharmaceuticals. Traditionally, these properties have been determined through theoretical calculations and simulations. However, it is not practical to perform such calculations for every candidate material. Recently, an approach combining theoretical calculation with machine learning has emerged: machine learning models are trained on a subset of theoretical calculation results to construct a surrogate model that can then be applied to the remaining materials. Separately, a technique called pre-training is used to improve the accuracy of machine learning models. Pre-training involves training the model on a pretext task, which differs from the target task, before training it on the target task. This process extracts features of the input data, stabilizing the learning process and improving its accuracy. However, in the case of molecular property prediction, there is a strong imbalance in the distribution of input data and features, which may bias learning toward frequently occurring data during pre-training. In this study, we propose an effective pre-training method that addresses this imbalance in the input data. We aim to improve the final accuracy by modifying the loss function of a representative existing pre-training method, node masking, to compensate for the imbalance. We investigate and assess the impact of the proposed imbalance compensation on pre-training and on the final prediction accuracy through experiments and evaluations on molecular property prediction benchmarks.
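One common way to compensate for label imbalance in a masked-node loss is inverse-frequency reweighting; the sketch below illustrates that general idea in isolation. It is an assumption for illustration, not the paper's exact loss: the weighting scheme, the toy label distribution, and all names are hypothetical.

```python
# Hedged sketch of imbalance compensation for node masking: reweight the
# masked-label reconstruction loss by inverse label frequency, so rare
# atom types contribute as much to pre-training as common ones.
import math
from collections import Counter

# Toy node labels (e.g. atom types), heavily imbalanced toward carbon.
labels = ["C"] * 90 + ["N"] * 7 + ["O"] * 3
freq = Counter(labels)
total = len(labels)

# Inverse-frequency weights, normalized so a uniform dataset gives weight 1.
weight = {c: total / (len(freq) * n) for c, n in freq.items()}

def weighted_masked_loss(predicted_probs, true_label):
    """Cross-entropy for one masked node, scaled by its class weight."""
    return -weight[true_label] * math.log(predicted_probs[true_label])

# A model that always leans toward the majority class is penalized far more
# when the masked node actually has a rare label:
probs = {"C": 0.90, "N": 0.07, "O": 0.03}
print(weighted_masked_loss(probs, "C"))  # common class, small weight
print(weighted_masked_loss(probs, "O"))  # rare class, large weight
```

Without the weights, the 90% majority class would dominate the gradient, which is the biased-learning failure mode the abstract points out.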

    Argobots: A Lightweight Low-Level Threading and Tasking Framework

    Get PDF
    In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, are either too specific to particular applications or architectures or not sufficiently powerful or flexible. In this paper, we present Argobots, a lightweight, low-level threading and tasking framework that is designed as a portable and performant substrate for high-level programming models and runtime systems. Argobots offers a carefully designed execution model that balances generality of functionality with a rich set of controls that allow specialization by end users or high-level programming models. We describe the design, implementation, and performance characterization of Argobots and present integrations with three high-level models: OpenMP, MPI, and colocated I/O services. Evaluations show that (1) Argobots, while providing richer capabilities, is competitive with existing simpler generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency-hiding capabilities; and (4) I/O services with Argobots reduce interference with colocated applications while achieving performance competitive with that of a Pthreads approach.
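The core idea behind user-level threading can be sketched language-neutrally. The following is an analogy only, not the Argobots C API: many cooperatively scheduled work units are multiplexed onto one OS thread, with the scheduler living entirely in user space; here each "ULT" is a Python generator and `yield` plays the role of a cooperative yield point.

```python
# Hedged analogy (not Argobots itself): a minimal user-space round-robin
# scheduler over cooperative work units, illustrating why user-level
# threads are cheap -- switching is just resuming a generator, with no
# OS involvement.
from collections import deque

def ult(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"   # cooperative yield back to the scheduler

def round_robin(ults):
    """One ready pool, FIFO dispatch, all in user space."""
    ready, trace = deque(ults), []
    while ready:
        current = ready.popleft()
        try:
            trace.append(next(current))   # run until the next yield point
            ready.append(current)         # still runnable: requeue
        except StopIteration:
            pass                          # work unit finished
    return trace

trace = round_robin([ult("A", 2), ult("B", 2)])
print(trace)  # interleaved execution: ['A:0', 'B:0', 'A:1', 'B:1']
```

Argobots generalizes this picture with multiple execution streams, pluggable pools, and pluggable schedulers, which is the "rich set of controls" the abstract refers to.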

    Schematic: A Concurrent Object-Oriented Extension to Scheme

    No full text
    A concurrent object-oriented extension to the programming language Scheme, called Schematic, is described. Schematic supports familiar constructs often used in typical parallel programs (future and higher-level macros such as plet and pbegin), which are actually defined atop a very small number of fundamental primitives. In this way, Schematic achieves both convenience for typical concurrent programming and simplicity and flexibility of the language kernel. Schematic also supports concurrent objects, which exhibit more natural and intuitive behavior than "bare" (unprotected) shared memory and permit more concurrency than the traditional Actor model. Schematic will be useful for intensive parallel applications on parallel machines or networks of workstations, concurrent GUI programming, distributed programming over networks, and even concurrent shell programming. To appear in Proceedings of Object Based Parallel and Distributed Co..
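The future construct the abstract builds on has the same shape in many languages; the sketch below illustrates its semantics with Python's standard library rather than Schematic's Scheme syntax (the function name and values are assumptions for illustration).

```python
# Hedged illustration of future semantics: the computation starts
# concurrently, and the caller blocks only when it touches the result.
from concurrent.futures import ThreadPoolExecutor

def slow_square(n):
    return n * n

with ThreadPoolExecutor() as pool:
    f = pool.submit(slow_square, 7)   # roughly: (future (slow-square 7))
    # ... the caller is free to do other work here ...
    result = f.result()               # "touch": blocks until the value exists

print(result)  # 49
```

Higher-level forms like plet and pbegin can then be understood as macros that spawn several such futures and touch them in a structured way, which matches the abstract's claim that they are defined atop a few fundamental primitives.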

    Efficient and Reusable Implementation of Fine-Grain Multithreading and Garbage Collection on Distributed-Memory Parallel Computers

    No full text
    Report number: Otsu 13537 ; Date of degree conferral: 1997-09-22 ; Degree category: Doctorate by dissertation ; Degree: Doctor of Science ; Diploma number: No. 13537 ; Graduate school / department: Graduate School of Science

    *+.61-/4;@??d7930:58h2icabgjlafe=k

    No full text
    This paper discusses a broad range of issues that make concurrent object-oriented programming (COOP) on distributed-memory multicomputers more comfortable and efficient. The topics presented include language design, the abstract machine, and memory management (including garbage collection). (1) As for the language design, a variant of future is introduced so that programmers can easily specify parallel activities and the synchronization between them. The language is made explicitly parallel to give programmers a simple cost model and opportunities for manual optimization, thereby achieving performance without imposing too much burden on the compiler. The expressive power of the language is demonstrated with typical synchronization code as well as example applications. (2) Runtime implementation issues are described in terms of our proposed abstract machine, StackThreads, thereby making the proposed mechanisms applicable not only to COOP languages but also to other languages such as functional languages. In Stac..

    Efficient and Reusable Implementation of Fine-Grain Multithreading and Garbage Collection on Distributed-Memory Parallel Computers

    No full text
    This thesis studies efficient runtime systems for parallelism management (multithreading) and memory management (garbage collection) on large-scale distributed-memory parallel computers. Both are fundamental primitives for implementing high-level parallel programming languages that support dynamic parallelism and dynamic data structures. A distinguishing feature of the developed multithreading system is that it tolerates a large number of threads on a single CPU while allowing direct reuse of existing sequential C compilers. In fact, it is able to turn any standard C procedure call into an asynchronous one. Given such a runtime system, the compiler of a high-level parallel programming language can fork a new thread simply by making a C procedure call to the corresponding C function. A thread can block its execution by calling a library procedure that saves the stack frame of the thread and unwinds the stack frames. To resume a thread, StackThreads provides another runtime routine that rebuilds the..

    A Methodology for Constructing Portable and Simple Global Garbage Collectors

    No full text
    Many garbage collectors on parallel computers are written in sequential languages and are therefore not portable across machines with different communication primitives. Moreover, the description of garbage collectors on distributed-memory machines, which use asynchronous messages, is complex. We implemented a garbage collector for the parallel object-oriented language Schematic by using Schematic itself. We show that a garbage collector can be more portable and simple when it is described on top of a parallel language, which is machine-independent and equipped with high-level communication constructs. We implemented the garbage collector on the distributed-memory machine AP1000 and measured its performance. 1 Introduction One of the difficult factors in constructing parallel languages is implementing global garbage collectors (garbage collectors (GC) that detect garbage that has been shared among several processors). The problems that occur in implementing garbage collectors on parallel ..
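The global-GC problem the abstract defines, detecting garbage shared among several processors, reduces at its core to reachability over a graph whose edges may cross processor boundaries. The sketch below shows only that core idea; the flat dictionaries, the per-processor root sets, and all names are illustrative assumptions, not Schematic or the AP1000 collector.

```python
# Hedged sketch of the global-GC idea: objects live on several
# "processors", references may cross processor boundaries, and a global
# mark phase walks the whole graph from every processor's root set.
heap = {                       # object id -> ids it references
    "a": ["b"], "b": ["c"],    # a -> b -> c (imagine c on another processor)
    "c": [],
    "d": ["e"], "e": [],       # d -> e is unreachable: global garbage
}
roots = {"proc0": ["a"], "proc1": []}   # per-processor root sets

def global_mark(heap, roots):
    """Mark phase: everything reachable from any root, remote edges too."""
    marked, stack = set(), [r for rs in roots.values() for r in rs]
    while stack:
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            stack.extend(heap[obj])     # following a remote reference would
                                        # require an asynchronous message
    return marked

live = global_mark(heap, roots)
garbage = set(heap) - live
print(sorted(live), sorted(garbage))   # ['a', 'b', 'c'] ['d', 'e']
```

The abstract's point is that the message sends hidden in the remote-edge traversal are exactly where a sequential-language implementation gets complex, whereas a parallel language with high-level communication constructs can express them directly.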